Objectives

  • Introduce myself
  • Identify major course objectives
  • Identify course logistics
  • Review the purpose of data visualizations
  • Examine several historic visualizations for their strengths and weaknesses

Introduction to the course

Me (Dr. Benjamin Soltoff)

Course website

Go to https://github.com/uc-cfss/dataviz for the course site. This contains the course objectives, required readings, schedules, slides, etc.

Enrollment in the course

Enrollment in the course is relatively small (10ish students at last count). The nice thing about having a small class is that I can tailor it to better meet your interests. The first six weeks or so of the course are pretty much set; however, in the second half we can customize it more to fit your interests and needs. For that reason, I’d like each of you to go to this issue on the course repo and share your thoughts on what you’d like to learn more about in the second half of the term. I have a tentative schedule that we can certainly stick to, but I am open to modifications if there are topics of interest to a substantial portion of the class.

Course objectives

  • Understand how the human mind perceives and interprets visual data
  • Distinguish different types of visualizations and identify appropriate use cases
  • Evaluate visualizations’ interpretability based on experimental design
  • Apply data visualization methods using a reproducible workflow

Purpose of visualizations

A visualization is “any kind of visual representation of information designed to enable communication, analysis, discovery, exploration, etc.”1 However, what you seek to communicate can vary widely depending on your goals, and therefore affects the type of visualization you should design.

Information visualization

With information visualization, the goal is to visually depict abstract data that have no inherent physical form, as opposed to scientific visualization, where the data themselves are objects (in 1D, 2D, or 3D space). These data can be numerical (continuous or discrete), categorical, temporal, geospatial, text, etc. The purpose is to convey abstract data accurately, reveal the underlying structure in the data, and (potentially) encourage exploration of the data via an interactive element. Importantly, the visualization should also be aesthetically pleasing.

Statistical graphics

Alternatively, statistical graphics seek to visualize abstract data, typically in quantitative form. The goal is to convey data accurately and reveal the underlying structure, but these graphics are generally not exploratory or interactive and may not always be aesthetically pleasing.

Examples

Scatterplot matrices

Scatterplot matrix of the Credit dataset. Source: An Introduction to Statistical Learning: With Applications in R.

Double-time bar charts

Double-time bar chart of crime in the city of San Francisco, 2009-10. Source: Visualizing Time with the Double-Time Bar Chart

  • Each set of 24 bars shows the same data. The top bars run from midnight to 11pm. The bottom bars run from noon to 11am.
  • Highlighted regions span 6 to 5 (6am-5pm; 6pm-5am)
  • Colors represent (roughly) day and night (yellow for day, blue for night)
  • Enables representing trends over a 24-hour period without breaking arbitrarily at midnight
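
The reordering behind the two sets of bars can be sketched in a few lines of Python. This is a minimal illustration, not the chart author's code: the function names and the hourly counts are hypothetical, and the only assumption is that the data arrive as 24 values indexed from midnight (0) to 11pm (23).

```python
def double_time_series(counts):
    """Return the two bar orderings used by a double-time bar chart.

    `counts` is assumed to hold 24 values, index 0 = midnight ... 23 = 11pm.
    """
    if len(counts) != 24:
        raise ValueError("expected exactly 24 hourly values")
    top = list(counts)                               # midnight -> 11pm
    bottom = list(counts[12:]) + list(counts[:12])   # noon -> 11am
    return top, bottom


def hour_label(hour):
    """Format an hour 0-23 as a 12-hour label such as '6am' or '11pm'."""
    suffix = "am" if hour < 12 else "pm"
    return f"{hour % 12 or 12}{suffix}"


# Hypothetical hourly counts, made up purely for illustration.
hourly_counts = [40, 30, 25, 15, 10, 12, 20, 35, 50, 55, 60, 65,
                 80, 75, 70, 72, 78, 90, 95, 85, 80, 70, 60, 50]

top, bottom = double_time_series(hourly_counts)
top_labels = [hour_label(h) for h in range(24)]
bottom_labels = [hour_label((h + 12) % 24) for h in range(24)]
```

Both lists contain the same 24 values; only the starting hour differs, so a trend that crosses midnight is unbroken in the bottom set while a trend that crosses noon is unbroken in the top set.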

Double-time bar chart of crime in the city of San Francisco, 2009-10. Source: Visualizing Time with the Double-Time Bar Chart

  • Compare different categories of crimes using small multiples (aka facets in ggplot2 language)
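
Conceptually, small multiples split one dataset into a separate panel per category, with every panel sharing the same axes. The sketch below shows only that data-splitting step in plain Python. The records and the function name are hypothetical, and a plotting library would normally handle this step for you.

```python
from collections import defaultdict


def counts_by_category(records):
    """Group (category, hour) records into one 24-bin hourly series per category.

    Each resulting series would feed one panel of a small-multiples display.
    """
    panels = defaultdict(lambda: [0] * 24)
    for category, hour in records:
        if not 0 <= hour <= 23:
            raise ValueError(f"hour out of range: {hour}")
        panels[category][hour] += 1
    return dict(panels)


# Hypothetical incident records, made up purely for illustration.
incidents = [
    ("theft", 14), ("theft", 14), ("theft", 23),
    ("assault", 2), ("assault", 2),
    ("vandalism", 0),
]

panels = counts_by_category(incidents)
```

Because every series has the same 24 bins, the panels can be drawn on identical axes, which is what makes comparison across categories straightforward.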

Other types of visualizations

Information dashboards

Information dashboards are popular in business and industry. They visualize abstract data, frequently (though not always) over time. The goal is to convey large amounts of information quickly and identify outliers and trends. The downside is that they can become extremely dense.

Examples

Student performance
Fitbit
Fitbit dashboard. Source: me

Infographics

Infographics depict abstract data in a way meant to be eye-catching, capture attention, and convey information quickly. Unfortunately, they are frequently inaccurate, do not use space efficiently, and may not encourage exploration of the data.

Examples

Sun strokes
Extremely sexual sun stroking. Source: The top 10 worst infographics ever created

Mapping paid paternity leave

Informative art

Informative art visualizes abstract data in an effort to make visualization ambient or a part of everyday life. The goal is to aesthetically please the audience, not to be informative.

What makes a good visualization

Dr. John Snow and cholera outbreak in London

Original map made by John Snow in 1854. Cholera cases are highlighted in black. Source: Wikipedia.

In 1854, the germ theory of disease was not yet widely accepted by the medical community or the public.2 That year a mother washed her baby’s diaper in a well in London, sparking an outbreak of cholera, an intestinal disease that causes vomiting and diarrhea and can lead to death. The disease had appeared in London before, but its cause was still unknown. Dr. John Snow lived in Soho, the district of London where the 1854 outbreak occurred, and wanted to understand how cholera spreads through a population, which makes him an early epidemiologist.

Snow recorded the locations of individuals who contracted cholera, including their places of residence and employment, and plotted them on a map of the area. The cases clustered around the well pump on Broad Street. From this map Snow deduced that the well was the source of the outbreak. Along the way he ruled out other causes by noting that individuals who lived in the area but did not contract cholera had not drunk from the well. Based on this information, the government removed the handle from the well pump so the public could not draw water from it. As a result, the cholera epidemic ended.

  • What makes this a good visualization?
  • One of the earliest examples of statistical visualizations

Minard’s map of Napoleon’s march on Russia

Charles Minard’s 1869 chart showing the number of men in Napoleon’s 1812 Russian campaign army, their movements, as well as the temperature they encountered on the return path. Source: Wikipedia.

English translation of Minard’s map

This illustration is identified in Edward Tufte’s The Visual Display of Quantitative Information as one of “the best statistical drawings ever created”. It also demonstrates a very important rule of warfare: never invade Russia in the winter.

In 1812, Napoleon ruled most of Europe. He wanted to seize control of the British Isles but could not overcome British defenses. He decided to impose an embargo to weaken the nation in preparation for invasion, but Russia refused to participate. Angered by this decision, Napoleon launched an invasion of Russia with over 400,000 troops in the summer of 1812. Russia was unable to defeat Napoleon in battle and instead waged a war of attrition. The Russian army was in near-constant retreat, burning or destroying anything of value along the way to deny France usable resources. While Napoleon’s army maintained the military advantage, lack of food and the onset of winter decimated his forces. He left France with an army of approximately 422,000 soldiers; he returned to France with just 10,000.

Charles Minard’s map is a stunning achievement for his era. It incorporates data across six dimensions to tell the story of Napoleon’s failure. The graph depicts:

  • Size of the army
  • Location in two dimensions (latitude and longitude)
  • Direction of the army’s movement
  • Temperature on dates during Napoleon’s retreat

What makes this such an effective visualization?3

  • Forces visual comparisons (colored bands for advancing and retreating)
  • Shows causality (temperature chart)
  • Captures multivariate complexity
  • Integrates text and graphic into a coherent whole (perhaps the first infographic, and done well!)
  • Illustrates high-quality content (based on reliable data)
  • Places comparisons adjacent to each other (all on the same page, no jumping back and forth between pages)
  • Minimalistic in nature (avoids what we will later term “chart junk”)

Data maps were among the first data visualizations, though they emerged only thousands of years after the first cartographic maps.

NYTimes weather summaries

How Much Warmer Was Your City in 2015?

Split into pairs and assess this graphic.

  • What data is related in the visualization? What are the dimensions/variables?
  • Approximately how many data points are recorded in the visualization?
  • What makes this a good/bad visualization?
  • What story is it conveying?